Method and system for label table caching in a routing device

ABSTRACT

An interconnect fabric module (“IFM”) with high-speed switching capabilities is provided. An interconnect fabric module can be dynamically configured to interconnect its communications ports so that data can be transmitted through the interconnected ports. Multiple interconnect fabric modules can be connected to form an interconnect fabric through which nodes (e.g., computer systems) can be interconnected. In one embodiment, data is transmitted through the interconnect fabric as frames such as those defined by the Fibre Channel and InfiniBand standards. The interconnect fabric module allows the creation of an interconnect fabric that is especially well suited for interconnecting devices utilizing multiple information types such as might be required by the devices of an enterprise data network (“EDN”).

CROSS-REFERENCE TO RELATED APPLICATION(S)

[0001] This application claims the benefit of U.S. Provisional Application No. 60/287,069 entitled “METHOD FOR IMPLEMENTING A CLUSTER NETWORK FOR HIGH PERFORMANCE AND HIGH AVAILABILITY USING A FIBRE CHANNEL SWITCH FABRIC,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/287,120 entitled “MULTI-PROTOCOL NETWORK FOR ENTERPRISE DATA CENTERS,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/286,918 entitled “UNIFIED ENTERPRISE NETWORK SWITCH (UNEX) PRODUCT SPECIFICATION,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/286,922 entitled “QUALITY OF SERVICE EXAMPLE,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/287,081 entitled “COMMUNICATIONS MODEL,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/287,075 entitled “UNIFORM ENTERPRISE NETWORK SYSTEM,” filed Apr. 27, 2001; U.S. Provisional Application No. 60/314,088 entitled “INTERCONNECT FABRIC MODULE,” filed Aug. 21, 2001; U.S. Provisional Application No. 60/314,287 entitled “INTEGRATED ANALYSIS OF INCOMING DATA TRANSMISSIONS,” filed Aug. 22, 2001; U.S. Provisional Application No. 60/314,158 entitled “USING VIRTUAL IDENTIFIERS TO ROUTE TRANSMITTED DATA THROUGH A NETWORK,” filed Aug. 21, 2001, and is related to U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR VIRTUAL ADDRESSING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048019US1); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR LABEL TABLE CACHING IN A ROUTING DEVICE,” (Attorney Docket No. 030048024US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR MULTIFRAME BUFFERING IN A ROUTING DEVICE,” (Attorney Docket No. 030048025US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR DOMAIN ADDRESSING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048026US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR INTERSWITCH LOAD BALANCING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048027US); U.S. patent application No. ______ entitled “METHOD AND SYSTEM FOR INTERSWITCH DEADLOCK AVOIDANCE IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048028US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR CONNECTION PREEMPTION IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048029US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR MULTICASTING IN A ROUTING DEVICE,” (Attorney Docket No. 030048030US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR NETWORK CONFIGURATION DISCOVERY IN A NETWORK MANAGER,” (Attorney Docket No. 030048032US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR PATH BUILDING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048033US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR RESERVED ADDRESSING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048035US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR RECONFIGURING A PATH IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048036US1); U.S. patent application No. ______ entitled “METHOD AND SYSTEM FOR ADMINISTRATIVE PORTS IN A ROUTING DEVICE,” (Attorney Docket No. 030048037US); U.S. patent application Ser. No. ______ entitled “PARALLEL ANALYSIS OF INCOMING DATA TRANSMISSIONS,” (Attorney Docket No. 030048038US); U.S. patent application Ser. No. ______ entitled “INTEGRATED ANALYSIS OF INCOMING DATA TRANSMISSIONS,” (Attorney Docket No. 030048039US); U.S. patent application Ser. No. ______ entitled “USING VIRTUAL IDENTIFIERS TO ROUTE TRANSMITTED DATA THROUGH A NETWORK,” (Attorney Docket No. 030048040US); U.S. patent application Ser. No. ______ entitled “USING VIRTUAL IDENTIFIERS TO PROCESS RECEIVED DATA ROUTED THROUGH A NETWORK,” (Attorney Docket No. 030048041US); U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR PERFORMING SECURITY VIA VIRTUAL ADDRESSING IN A COMMUNICATIONS NETWORK,” (Attorney Docket No. 030048042US); and U.S. patent application Ser. No. ______ entitled “METHOD AND SYSTEM FOR PERFORMING SECURITY VIA DE-REGISTRATION IN A COMMUNICATIONS NETWORK” (Attorney Docket No. 030048043US), which are all hereby incorporated by reference in their entirety.

TECHNICAL FIELD

[0002] The described technology relates to network switches.

BACKGROUND

[0003] The Internet has emerged as a critical commerce and communications platform for businesses and consumers worldwide. The dramatic growth in the number of Internet users, coupled with the increased availability of powerful new tools and equipment that enable the development, processing, and distribution of data across the Internet, has led to a proliferation of Internet-based applications. These applications include e-commerce, e-mail, electronic file transfers, and online interactive applications. As the number of users of, and uses for, the Internet increases, so do the complexity and volume of Internet traffic. According to UUNet, Internet traffic doubles every 100 days. Because of this traffic and its business potential, a growing number of companies are building businesses around the Internet and developing mission-critical business applications to be provided over the Internet.

[0004] Existing enterprise data networks (“EDNs”) that support e-commerce applications providing services to customers are straining under the demand to provide added performance and added services. The growing customer demand for services, along with a highly competitive market, has resulted in increasingly complex ad hoc EDNs. Affordable, high-performance EDN solutions require extensive scalability, very high availability, and ease of management. These attributes are significantly compromised or completely lost as existing solutions are grown to meet the demand.

[0005] Current architectures of EDNs typically include three sub-networks: 1) a local area network (LAN) for web and database servers, 2) a computational network for application servers, and 3) a storage area network (SAN). The processing and storage elements attached to these sub-networks may have access to a wide area network (WAN) or metropolitan area network (MAN) through a bridging device commonly known as an edge switch. Each of these sub-networks typically uses a distinct protocol and associated set of hardware and software, including network interface adapters, network switches, network operating systems, and management applications. Communication through the EDN requires bridging between the sub-networks, which requires active participation of server processing resources for protocol translation and interpretation.

[0006] There are many disadvantages to the current architecture of EDNs. The disadvantages arise primarily because the multi-tiered architecture is fractured and complex. First, it is very difficult to integrate the disparate systems that use different communications protocols, interfaces, and so on. Second, overall performance suffers because each sub-network is managed separately, rather than being managed with comprehensive knowledge of the complete network. Third, the cost of maintaining three disparate types of network hardware and software can be high. Fourth, it is difficult to scale an architecture that uses such disparate systems. It would be desirable to have an architecture for EDNs that would alleviate the many disadvantages of the current fractured multi-tiered architectures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram illustrating components of the interconnect fabric module (“IFM”) in one embodiment.

[0008] FIG. 2 is a block diagram illustrating components of a switch protocol controller in one embodiment.

[0009] FIG. 3 is a block diagram illustrating the contents of a label table in one embodiment.

[0010] FIG. 4 is a block diagram illustrating the format of a frame in one embodiment.

[0011] FIG. 5 is a diagram illustrating the logic of an arbitrator of a switch protocol controller in one embodiment.

[0012] FIG. 6 is a block diagram illustrating the transmit controller in one embodiment.

[0013] FIG. 7 is a block diagram illustrating the interconnection of interconnect fabric modules forming an interconnect fabric that connects various nodes.

[0014] FIG. 8 is a block diagram illustrating the mapping of a destination identifier to a port map.

[0015] FIG. 9 is a block diagram illustrating switch protocol controller caching in one embodiment.

[0016] FIG. 10 is a block diagram illustrating multiframe buffering.

[0017] FIG. 11 is a diagram illustrating the logic of the buffer arbitrator in one embodiment.

[0018] FIG. 12 is a block diagram illustrating an interconnect fabric configuration with multiple direct links between interconnect fabric modules.

[0019] FIG. 13 is a block diagram illustrating the use of equivalent ports.

[0020] FIG. 14 is a diagram illustrating the logic of the equivalent port service in one embodiment.

[0021] FIG. 15 is a block diagram illustrating a component for identifying upper layer protocol ports.

[0022] FIG. 16 is a block diagram illustrating an interswitch deadlock.

[0023] FIG. 17 is a diagram illustrating the logic of the deadlock avoidance algorithm in one embodiment.

[0024] FIG. 18 illustrates the preempting of a connection.

[0025] FIG. 19 is a diagram illustrating the logic of processing a preemption signal in one embodiment.

[0026] FIG. 20 is a diagram illustrating the logic of distributed class 3 multicasting in one embodiment.

DETAILED DESCRIPTION

[0027] An interconnect fabric module (“IFM”) with high-speed switching capabilities is provided. In one embodiment, an interconnect fabric module can be dynamically configured to interconnect its communications ports so that data can be transmitted through the interconnected ports. Multiple interconnect fabric modules can be connected to form an interconnect fabric through which nodes (e.g., computer systems) can be interconnected. In one embodiment, data is transmitted through the interconnect fabric as frames such as those defined by the Fibre Channel standard. Fibre Channel is defined in the ANSI T11 FC-PH, FC-PH-2, FC-PH-3, FC-PI, and FC-FS industry standard documents, which are hereby incorporated by reference. One skilled in the art will appreciate, however, that the described techniques can be used with communications standards other than Fibre Channel. In particular, the described techniques can be used with the InfiniBand standard, which is described in the InfiniBand Architecture Specification, Vols. 1-2, Release 1.0, Oct. 24, 2000, which is hereby incorporated by reference. As will be described below in more detail, the interconnect fabric module allows the creation of an interconnect fabric that is especially well suited for interconnecting devices utilizing multiple information types such as might be required by the devices of an enterprise data network (“EDN”).

[0028] The interconnect fabric modules use a virtual addressing technique to identify source and destination devices (e.g., another interconnect fabric module or a node). To send data from one node to another, the source node may first register with a network manager of the interconnect fabric so that a communications path can be established between the source node and the destination node. The network manager selects source and destination virtual addresses to be used by the source and destination nodes when sending frames to each other. The network manager also identifies a path through the interconnect fabric modules and their ports through which frames will be sent between the nodes. The network manager then configures the interconnect fabric modules of the identified path so that when an interconnect fabric module receives a frame that indicates the destination virtual address, that frame is forwarded to the destination node via the path. The network manager need only configure the interconnect fabric modules once for the path to be available to the nodes. Each interconnect fabric module may maintain, for each of its ports, a virtual address table that maps virtual addresses to destination ports. When a frame is received at a source port, the interconnect fabric module uses the virtual address of that frame and the virtual address table for the source port to identify the destination port through which the frame is to be forwarded. A virtual address thus identifies a path between devices, rather than identifying a source or a destination device. The use of virtual addresses gives the network manager the flexibility to change paths dynamically to meet overall system needs. For example, if one interconnect fabric module on a path fails, the network manager may reconfigure the interconnect fabric modules to route the path around the failed interconnect fabric module, transparently to the source and destination nodes. Also, if multiple destination nodes provide the same functionality, then the network manager may implement node load balancing by changing a path so that data will be sent to a different destination node. Because virtual addresses identify paths, these changes can be made without changing the source and destination virtual addresses of the path.
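The per-port virtual address table described above can be sketched as a dictionary per source port. This is a minimal illustration, not the patented implementation: the class name `InterconnectFabricModule`, the `configure`/`forward` methods, and the integer address values are hypothetical. The example shows why rerouting is transparent to the nodes: the network manager changes only the destination port, while the virtual address carried in the frames stays the same.

```python
from typing import Dict


class InterconnectFabricModule:
    """Hypothetical model: one virtual address table per source port."""

    def __init__(self, num_ports: int = 32) -> None:
        # va_tables[source_port][virtual_address] -> destination port
        self.va_tables: Dict[int, Dict[int, int]] = {p: {} for p in range(num_ports)}

    def configure(self, source_port: int, virtual_address: int, dest_port: int) -> None:
        """Install or change a mapping, as the network manager would when building a path."""
        self.va_tables[source_port][virtual_address] = dest_port

    def forward(self, source_port: int, dest_virtual_address: int) -> int:
        """Return the destination port for a frame received on source_port."""
        table = self.va_tables[source_port]
        if dest_virtual_address not in table:
            raise LookupError("virtual address not configured on this port")
        return table[dest_virtual_address]


ifm = InterconnectFabricModule()
ifm.configure(source_port=3, virtual_address=0x0A01, dest_port=7)
print(ifm.forward(3, 0x0A01))   # 7
# Reroute around a failed module: same virtual address, new destination port.
ifm.configure(source_port=3, virtual_address=0x0A01, dest_port=12)
print(ifm.forward(3, 0x0A01))   # 12
```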

[0029] In one embodiment, a virtual address is part of a virtual identifier (e.g., stored as the source or destination identifier in a frame) that also includes a domain address. A destination identifier thus comprises a domain address and a virtual address. The interconnect fabric modules use the destination identifiers of the frames they receive to forward those frames. Each interconnect fabric module is assigned a domain address, and interconnect fabric modules that are assigned the same domain address are in the same domain. The interconnect fabric modules use the domain addresses to forward frames between domains, and the network manager may configure the interconnect fabric modules with inter-domain paths. When an interconnect fabric module receives a frame whose destination domain address matches its own domain address, the frame has arrived at its destination domain, and the interconnect fabric module forwards the frame in accordance with the destination virtual address. If, however, the domain addresses do not match, then the frame has not yet arrived at its destination domain, and the interconnect fabric module forwards the frame along an inter-domain path. Each port of an interconnect fabric module may have a domain address table (configured by the network manager) that maps domain addresses to the destination port through which frames with that domain address are to be forwarded. Thus, an interconnect fabric module may selectively use virtual addresses and domain addresses when forwarding frames.
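The domain-versus-virtual-address decision can be sketched as follows. The bit layout is an assumption made only for illustration: a 24-bit destination identifier whose top 8 bits hold the domain address and whose low 16 bits hold the virtual address, loosely modeled on the Fibre Channel D_ID field. The function names and table contents are hypothetical.

```python
from typing import Dict, Tuple

DOMAIN_SHIFT = 16      # assumed: domain address in bits 23..16
VA_MASK = 0xFFFF       # assumed: virtual address in bits 15..0


def split_identifier(dest_id: int) -> Tuple[int, int]:
    """Split a destination identifier into (domain address, virtual address)."""
    return (dest_id >> DOMAIN_SHIFT) & 0xFF, dest_id & VA_MASK


def select_port(dest_id: int, my_domain: int,
                va_table: Dict[int, int], domain_table: Dict[int, int]) -> int:
    domain, va = split_identifier(dest_id)
    if domain == my_domain:
        # The frame has reached its destination domain: route on the virtual address.
        return va_table[va]
    # Otherwise follow the inter-domain path configured for that domain.
    return domain_table[domain]


print(select_port(0x050042, 5, {0x0042: 9}, {7: 30}))   # 9  (local domain)
print(select_port(0x070042, 5, {0x0042: 9}, {7: 30}))   # 30 (inter-domain path)
```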

[0030] In one embodiment, an interconnect fabric module may implement virtual address tables (or domain address tables) using a caching mechanism. Each port of an interconnect fabric module may have its own cache of mappings from virtual addresses to destination ports. When a frame is received at a source port, the interconnect fabric module checks the cache of that source port to determine whether it has a mapping for the destination virtual address of that frame. If not, the interconnect fabric module checks a virtual address table that is shared by multiple ports. When the virtual address table has a mapping for the destination virtual address, the interconnect fabric module forwards the frame in accordance with that mapping. The interconnect fabric module also stores that mapping in the cache for the source port so that the mapping can be retrieved more quickly when a subsequent frame with that destination virtual address is received at the source port. In an alternate embodiment, when the virtual address table does not have a mapping for the destination virtual address, the interconnect fabric module requests the mapping from the network manager or from an external virtual address table. When that mapping is provided by the network manager or the external table, the interconnect fabric module stores it in the virtual address table. Thus, an interconnect fabric module may implement no caching, two-tiered caching, or three-tiered caching for virtual addresses (or domain addresses).
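The three-tiered lookup above can be sketched as shown below. The paragraph does not say how a full port cache chooses what to evict; least-recently-used eviction is an assumption introduced here, as are the names `PortCache`, `lookup`, and the `ask_manager` callback standing in for the network manager or external table.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class PortCache:
    """Per-port cache of virtual address -> destination port (LRU eviction is assumed)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def get(self, va: int) -> Optional[int]:
        if va in self.entries:
            self.entries.move_to_end(va)          # mark as most recently used
            return self.entries[va]
        return None

    def put(self, va: int, port: int) -> None:
        self.entries[va] = port
        self.entries.move_to_end(va)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used


def lookup(va: int, cache: PortCache, shared: Dict[int, int],
           ask_manager: Optional[Callable[[int], Optional[int]]] = None) -> Optional[int]:
    port = cache.get(va)                  # tier 1: source-port cache
    if port is not None:
        return port
    port = shared.get(va)                 # tier 2: shared virtual address table
    if port is None and ask_manager is not None:
        port = ask_manager(va)            # tier 3: network manager / external table
        if port is not None:
            shared[va] = port             # store in the shared table
    if port is not None:
        cache.put(va, port)               # speed up later frames on this port
    return port
```

Passing `ask_manager=None` gives the two-tiered variant; bypassing `PortCache` and using only `shared` corresponds to no caching.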

[0031] In one embodiment, an interconnect fabric module may implement multiframe buffering at each port so that frames can be buffered at source ports before being forwarded to a destination port. When a first frame is received at a source port, the interconnect fabric module stores that first frame in a first buffer of that source port. When a second frame is received at that source port, the interconnect fabric module stores that second frame in a second buffer of that source port. The interconnect fabric module may then identify a priority score for each of the first and second frames and transmit the frame with the higher priority score first. In this way, the interconnect fabric module provides both multiframe buffering for source ports and priority selection of the buffered frames.
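A minimal sketch of two-buffer multiframe buffering with priority selection follows. The paragraph does not define how a priority score is computed; here it is assumed to be an integer carried with each frame, with larger values winning and ties going to the lower-numbered buffer. `Frame` and `SourcePort` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    payload: bytes
    priority: int          # assumed: larger value = higher priority score


@dataclass
class SourcePort:
    # Two frame buffers per source port, as in the first/second buffer example.
    buffers: List[Optional[Frame]] = field(default_factory=lambda: [None, None])

    def receive(self, frame: Frame) -> bool:
        """Store the frame in the first free buffer; False means all buffers are full."""
        for i, slot in enumerate(self.buffers):
            if slot is None:
                self.buffers[i] = frame
                return True
        return False

    def next_to_transmit(self) -> Optional[Frame]:
        """Remove and return the buffered frame with the highest priority score."""
        occupied = [i for i, f in enumerate(self.buffers) if f is not None]
        if not occupied:
            return None
        # max() keeps the first index on ties, so equal scores go to the lower buffer.
        best = max(occupied, key=lambda i: self.buffers[i].priority)
        frame = self.buffers[best]
        self.buffers[best] = None
        return frame
```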

[0032] In one embodiment, an interconnect fabric module may implement interswitch load balancing via groups of equivalent ports. As discussed above, interconnect fabric modules may themselves be interconnected to form an interconnect fabric for connecting nodes. Two interconnect fabric modules may have multiple links directly connecting their ports. Ports are considered equivalent when a frame can be transmitted on any one of them to reach its final destination. The use of multiple links (and equivalent ports) between interconnect fabric modules allows for greater bandwidth between those interconnect fabric modules. The network manager may configure each interconnect fabric module to indicate which groups of its ports are equivalent. The interconnect fabric module may have an equivalent ports table that maps each port to its equivalent ports. When the interconnect fabric module receives a frame, it identifies a destination port based on the virtual address (or domain address) in the frame. If the identified destination port is currently in use, then the interconnect fabric module checks the equivalent ports table to determine whether there are any equivalent ports. If so, and an equivalent port is not in use, the interconnect fabric module forwards the frame through that equivalent port. In this way, interconnect fabric modules can balance their load through the use of equivalent ports.
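The equivalent-port fallback can be sketched in a few lines. The function name `choose_port`, the set of busy ports, and the table contents are illustrative assumptions. The order in which equivalent ports are tried (list order here) is also an assumption, since the paragraph does not specify one.

```python
from typing import Dict, List, Optional, Set


def choose_port(dest_port: int, busy: Set[int],
                equivalent: Dict[int, List[int]]) -> Optional[int]:
    """Pick the identified port if free, else the first free equivalent port."""
    if dest_port not in busy:
        return dest_port
    for alt in equivalent.get(dest_port, []):
        if alt not in busy:
            return alt
    return None    # every equivalent port is busy: the frame has to wait


# Ports 4, 5 and 6 are three parallel links to the same neighboring module.
eq_table = {4: [5, 6], 5: [4, 6], 6: [4, 5]}
print(choose_port(4, {4}, eq_table))      # 5
print(choose_port(4, {4, 5}, eq_table))   # 6
```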

[0033] In one embodiment, an interconnect fabric module uses a crosspoint switch to connect its source and destination ports. When the crosspoint switch has more switch ports than the interconnect fabric module has ports, an extra switch port can be used for administrative functions of the network manager. When an interconnect fabric module receives a frame directed to a virtual address reserved for administrative services of the network manager, the interconnect fabric module connects the source port to the extra switch port, which is connected to the network manager. When the frame is transmitted from the source port, the network manager receives the frame and processes it in accordance with its administrative functions. In this way, administrative frames can be forwarded directly to the network manager when they are first received by an interconnect fabric module from a node.
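A sketch of the reserved-address check, assuming the 32-port module and 34-port crosspoint switch described for FIG. 1, so that switch ports 32 and 33 are extras. Which extra port is wired to the network manager, and the value of the reserved virtual address, are hypothetical choices made only for this example.

```python
from typing import Dict

NUM_IFM_PORTS = 32                 # module ports 0-31
ADMIN_SWITCH_PORT = 32             # assumed: first extra crosspoint port, wired to the network manager
ADMIN_VIRTUAL_ADDRESS = 0xFFFE     # hypothetical reserved administrative virtual address


def route(dest_va: int, label_table: Dict[int, int]) -> int:
    """Return the crosspoint output for a frame with destination virtual address dest_va."""
    if dest_va == ADMIN_VIRTUAL_ADDRESS:
        # Administrative frame: hand it straight to the network manager.
        return ADMIN_SWITCH_PORT
    port = label_table[dest_va]
    if not 0 <= port < NUM_IFM_PORTS:
        raise ValueError("label table entry is not a module port")
    return port
```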

[0034] In one embodiment, a connection can be established from a source node to a destination node through multiple interconnect fabric modules. In certain circumstances, two directly linked interconnect fabric modules may encounter a deadlock when both are attempting to establish a connection using the same link. In such a situation, each interconnect fabric module already has a partially built connection through it and detects that a request for a conflicting connection has been received. Each interconnect fabric module then determines which interconnect fabric module has the higher priority. If an interconnect fabric module determines that it does not have the higher priority, then it terminates its partially built connection and allows the conflicting connection with the higher priority to be built. The interconnect fabric module with the higher priority keeps its partially built connection and indicates that the conflicting connection cannot be established. By detecting potential deadlocks at the interconnect fabric module level, overall performance of the interconnect fabric is improved.
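The paragraph does not say how the two modules decide which has the higher priority. One simple rule, assumed here purely for illustration, is to compare unique module identifiers (FIG. 1 shows an IFM identifier 106) and let the larger one win. Because both modules apply the same rule to the same pair of identifiers, exactly one of them backs off.

```python
from dataclasses import dataclass


@dataclass
class Ifm:
    identifier: int                  # assumed unique, e.g. read from the IFM identifier
    partial_connection: bool = True  # a connection is half-built through this module


def resolve_conflict(me: Ifm, peer_identifier: int) -> str:
    """Decide locally what to do when a conflicting request arrives on the shared link."""
    if me.identifier > peer_identifier:        # assumed rule: larger identifier has priority
        return "keep-and-reject-peer"          # peer's connection cannot be established
    me.partial_connection = False              # back off: tear down the partial connection
    return "abandon-and-accept-peer"


a, b = Ifm(identifier=7), Ifm(identifier=3)
print(resolve_conflict(a, b.identifier))   # keep-and-reject-peer
print(resolve_conflict(b, a.identifier))   # abandon-and-accept-peer
```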

[0035] In one embodiment, an interconnect fabric module allows an existing connection between a source node and a destination node to be preempted by a request for a proposed connection that specifies a higher priority and specifies that existing connections are to be preempted. When an interconnect fabric module receives a connection request at a source port, it identifies a destination port. If the destination port is currently part of an existing connection and the proposed connection indicates to preempt, then the interconnect fabric module determines whether the proposed connection or the existing connection has the higher priority. If the existing connection has the higher priority, then the interconnect fabric module indicates that the proposed connection cannot be made. If, however, the proposed connection has the higher priority, then the interconnect fabric module indicates that the existing connection is to be terminated and then proceeds to establish the proposed connection. The use of priorities to preempt existing connections allows connection management to be distributed through the interconnect fabric, rather than performed directly by the network manager.
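The preemption decision can be sketched as below. `Connection`, `request_connection`, and the `active` map from destination port to connection are hypothetical. The treatment of equal priorities, where the existing connection is kept, is an assumption, since the paragraph only covers a strictly higher priority.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Connection:
    name: str
    priority: int
    preempt: bool = False       # request asks to preempt existing connections


def request_connection(dest_port: int, proposed: Connection,
                       active: Dict[int, Connection]) -> Tuple[bool, Optional[Connection]]:
    """Return (accepted, connection that must be terminated, if any)."""
    existing = active.get(dest_port)
    if existing is None:
        active[dest_port] = proposed
        return True, None
    if not proposed.preempt or existing.priority >= proposed.priority:
        return False, None      # proposed connection cannot be made
    active[dest_port] = proposed
    return True, existing       # caller signals termination of the existing connection
```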

[0036] In one embodiment, a device may send a frame that is to be multicast to multiple destinations without acknowledgment. The Fibre Channel communications standard refers to such frames as class 3 frames. Such frames are not guaranteed to be received by each destination. When an interconnect fabric module receives such a frame, it identifies the destination ports through which the frame is to be forwarded and forwards the frame to each identified destination port that is not currently in use. If an identified destination port is currently in use, the interconnect fabric module keeps the frame stored in the buffer until the identified destination port becomes available or until the time to live for the frame expires. When an identified destination port becomes available, the interconnect fabric module forwards the frame to that destination port. In this way, the interconnect fabric module increases the chances that the frame is successfully received by all of its destinations.
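The retry-until-time-to-live behavior can be simulated with discrete ticks. The tick model and the `busy_at` input, which lists the ports in use at each tick, are hypothetical simulation devices rather than anything the paragraph specifies. Destinations that stay busy until the time to live runs out never receive the frame, consistent with class 3 delivery not being guaranteed.

```python
from typing import Dict, List, Set


def multicast_class3(dest_ports: List[int], busy_at: Dict[int, Set[int]],
                     time_to_live: int) -> Dict[int, int]:
    """Return {port: tick delivered} for each destination reached before the TTL expires."""
    pending = list(dest_ports)
    delivered: Dict[int, int] = {}
    for tick in range(time_to_live):
        busy = busy_at.get(tick, set())
        for port in list(pending):
            if port not in busy:
                delivered[port] = tick   # forward the buffered frame now
                pending.remove(port)
        if not pending:
            break                        # all destinations served; free the buffer
    return delivered                     # ports still pending never get the frame


busy = {0: {2, 3}, 1: {3}, 2: {3}}
print(multicast_class3([1, 2, 3], busy, time_to_live=3))   # {1: 0, 2: 1}
print(multicast_class3([1, 2, 3], busy, time_to_live=4))   # {1: 0, 2: 1, 3: 3}
```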

[0037] In the following, aspects of the interconnect fabric module are described using block diagrams and logic diagrams. One skilled in the art will appreciate that the techniques of the interconnect fabric module can be implemented using different combinations of logic circuits and/or firmware. In particular, the logic diagrams illustrate processing that may be performed in parallel using duplicate logic circuits (e.g., one for each line of a bus) or in serial using a single logic circuit. The particular logic designs can be tailored to meet the cost and performance objectives of the interconnect fabric module. One skilled in the art will be able to readily design logic circuits based on the following descriptions.

[0038] In one embodiment, many different techniques may be used by the network manager, the routing devices, and the nodes to ensure the security of the network. In particular, the network manager may authenticate each node attempting to register to ensure that the node is not an imposter node. In this way, only previously authorized nodes can access the network. The routing devices may also discard any communication that is addressed with a virtual address that is not properly configured in the routing device. More generally, the routing devices and nodes may check the header or other information of a communication to ensure that the communication is valid. If it is not valid, then the routing device or node can disregard the communication. For example, a routing device may detect that a communication received from a node specifies a higher priority than the priority authorized for the node by the network manager. In such a case, the routing device may discard the communication to prevent the node from using a priority that is higher than authorized. The routing device may also remove its configured virtual addresses to prevent their use by nodes past an allotted time period or by an imposter node. These security techniques can help ensure the overall security of the network and help prevent some all-too-common security problems, such as a denial-of-service attack. A denial-of-service attack can be prevented because an unauthorized node can only send communications through the routing device to which it is directly connected. That routing device can detect that the communication is unauthorized and immediately discard it, so communications sent from the unauthorized node never reach the targeted node or the rest of the network. Moreover, since the routing device that is directly connected to the unauthorized node handles the security, the unauthorized communications do not impact network bandwidth, except possibly for the bandwidth through the directly connected routing device.

[0039] In one embodiment, the network manager coordinates network security with the routing devices and the nodes. When a node registers with the network manager, the network manager authenticates the node. The network manager and the node may use a PKI-based (“Public Key Infrastructure”) authentication technique. For example, a node may generate a private and public key pair. The node then provides its public key to the network manager during authorization, which may be coordinated by a person who is a network administrator. Once authorized, the node can register with the network manager. To register, the node encrypts its registration request (or a portion of it) using its private key and then sends the encrypted registration request to the network manager. The network manager decrypts the registration request using the node's public key. If the request is correctly decrypted, then the network manager knows it was sent by an authorized node and proceeds with the registration. If, however, the request is not correctly decrypted, then the network manager knows that the request was sent by an imposter (or otherwise unauthorized) node and disregards the registration request. To ensure that a registration request is not intercepted and decrypted by an unauthorized node that has the authorized node's public key, the network manager may generate its own private and public key pair and provide its public key to the authorized node. An authorized node can then further encrypt the registration request with the network manager's public key. In this way, only the network manager can decrypt and recognize the registration request. One skilled in the art will appreciate that these encryption techniques can be used to protect any communication sent via the network and not just registration requests. In addition, various other authentication techniques may be used during registration of a node.

[0040] In one embodiment, a routing device filters communications sent from a directly connected node so that unauthorized communications are not transmitted through the network. The routing device may filter communications based on information contained in the header of the communication. In particular, a source-side port that receives a communication may discard the communication when the virtual address of the communication is not in the label table of the port. In addition, the network manager, when it configures a routing device at node registration, may configure the source-side port with filter parameters other than the virtual address. For example, the network manager may provide the source-side port with the maximum priority or the classes of service that the node is authorized to use. When the port receives a communication, it determines whether the communication violates any of the filter parameters and, if so, discards the communication. The routing device may also notify the network manager of the unauthorized communication. Because the filtering is performed at the ports, unauthorized communications have minimal impact on overall network performance.
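A sketch of the source-side port filter, under the assumption that a communication's header has already been decoded into a destination virtual address, a priority, and a class of service. `PortFilter` and its fields are hypothetical names for the parameters the network manager would install at registration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class PortFilter:
    label_table: Set[int]       # virtual addresses configured on this source-side port
    max_priority: int           # highest priority the node is authorized to use
    allowed_classes: Set[int]   # classes of service the node is authorized to use

    def accept(self, dest_va: int, priority: int, service_class: int) -> bool:
        """Return False if the communication should be discarded at this port."""
        if dest_va not in self.label_table:
            return False        # virtual address not configured on this port
        if priority > self.max_priority:
            return False        # node is claiming more priority than authorized
        return service_class in self.allowed_classes
```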

[0041] In one embodiment, the security of the network is further enhanced by the removal of virtual addresses from the routing devices and from the nodes. When a virtual address is removed from a routing device or a node, communications directed to that virtual address are no longer accepted by the routing device or node. A virtual address may be removed for various reasons, including when the network manager requests that it be removed, when a routing device or node detects a timeout for it, and when the routing device or node detects an error at the physical layer. The network manager may request that a virtual address be removed as part of a node's de-registration process. The de-registration may be initiated by the network manager or by the node itself. In either case, the network manager may send a request to remove the virtual address to each source-side port along the path from the source node to the destination node. The network manager may also send a request that the node itself remove its virtual address. When a routing device or node receives a virtual address, it may automatically remove the virtual address after a certain timeout period. The network manager may specify the timeout period, or the routing device or node may set its own timeout period. The routing device or node may restart the timeout period whenever a communication is received or sent using that virtual address, so that removal is based on when the virtual address was last used. The routing device or node may also remove a virtual address when certain events (e.g., errors) are detected at the physical layer. For example, the physical layer of a routing device may detect that the communications link between the routing device and a node has been removed (e.g., the line has been unplugged from the source-side port of the routing device). In such a case, the routing device may automatically remove all the virtual addresses associated with that node (e.g., those stored in the label table of the source-side port). In this way, an imposter node cannot then be connected to the routing device and start sending communications using the virtual addresses of the disconnected node. In addition, since the routing devices are not configured until a node registers (i.e., just-in-time configuration), the length of time that the network is configured to support a node tends to be minimized and tends to be on an as-needed basis. Configuring the network on an as-needed basis tends to reduce the opportunities an imposter node has to access the network and tends to free up network resources to be used by other authorized nodes.
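The three removal triggers (explicit request, idle timeout restarted on each use, and physical-layer link loss) can be sketched on a single source-side label table. Time is passed in explicitly as seconds so the behavior is deterministic. The class and method names are hypothetical, and expiring when the idle time reaches the timeout (rather than strictly exceeds it) is an arbitrary choice.

```python
from typing import Dict


class LabelTable:
    """Source-side port label table with idle-timeout removal (times in seconds)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_used: Dict[int, float] = {}    # virtual address -> time of last use

    def add(self, va: int, now: float) -> None:
        self.last_used[va] = now                 # configured at node registration

    def remove(self, va: int) -> None:
        self.last_used.pop(va, None)             # network manager request (de-registration)

    def expire(self, now: float) -> None:
        stale = [v for v, t in self.last_used.items() if now - t >= self.timeout]
        for va in stale:
            del self.last_used[va]

    def use(self, va: int, now: float) -> bool:
        """Accept a communication on va, restarting its timeout; False means discard."""
        self.expire(now)
        if va not in self.last_used:
            return False
        self.last_used[va] = now
        return True

    def link_down(self) -> None:
        self.last_used.clear()                   # physical-layer loss: drop every address
```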

[0042] FIG. 1 is a block diagram illustrating components of the interconnect fabric module (“IFM”) in one embodiment. The interconnect fabric module 100 includes 32 switch protocol controllers (“SPC”) 101, a crosspoint switch 102, a switch control unit (“SCU”) 103, a field programmable gate array (“FPGA”) monitor 104, an arbitration bus 105, and an IFM identifier 106. The interconnect fabric module has 32 bi-directional communication ports. A switch protocol controller controls each communications port. Each switch protocol controller is responsible for decoding the header information of a frame, arbitrating access to destination ports and configuring the crosspoint switch, and transmitting the received frame through the crosspoint connections to one or more communication ports. The switch control unit receives requests for crosspoint connections from the switch protocol controllers, configures the crosspoint switch accordingly, and directs the switch protocol controllers to transmit their frames through the crosspoint connections. The crosspoint switch provides full crossbar functionality in that each port of the interconnect fabric module can be simultaneously connected to any number of ports. In one embodiment, the crosspoint switch has 34 inputs and 34 outputs, numbered 0-33. The field programmable gate array monitor connects to an interconnect fabric module manager (not shown), which is a single board computer that may provide an interface for configuring the interconnect fabric module and may provide an interface to upper layer protocol services such as a name server or alias server.
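The crossbar behavior described for the 34-port crosspoint switch can be modeled with a map from each output to the input driving it. This model is an assumption for illustration: one input may fan out to any number of outputs, each output has at most one driver, and a request that touches a busy output configures nothing. `CrosspointSwitch`, `connect`, and `release` are hypothetical names.

```python
from typing import Dict, Iterable


class CrosspointSwitch:
    """Full crossbar model: any input may drive any number of free outputs."""

    def __init__(self, size: int = 34) -> None:
        self.size = size                         # ports numbered 0 .. size-1
        self.driver: Dict[int, int] = {}         # output -> input currently driving it

    def connect(self, src: int, outputs: Iterable[int]) -> bool:
        outs = list(outputs)
        if not all(0 <= p < self.size for p in [src, *outs]):
            raise ValueError("port out of range")
        if any(o in self.driver for o in outs):
            return False                         # an output is busy: configure nothing
        for o in outs:
            self.driver[o] = src
        return True

    def release(self, src: int) -> None:
        """Tear down every crosspoint connection driven by src."""
        for o in [o for o, s in self.driver.items() if s == src]:
            del self.driver[o]
```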

[0043] FIG. 2 is a block diagram illustrating components of a switch protocol controller in one embodiment. The switch protocol controller 200 includes a receive controller 201, a decoder 202, a header processor 203, a frame buffer 204, a transmit controller 205, and an arbitrator 206. The receive controller is connected to the input (i.e., receive side) of a port and may perform a serial-to-parallel conversion of the received frame. The decoder provides the header information of the received frame to the header processor and stores the frame in the frame buffer. The header processor includes a processor 207, a label table 208, and an equivalent port table 209. The label table contains port maps that indicate to which ports a frame should be routed (“switch destination port”) based on the port through which the frame is received (i.e., “switch source port”) and the destination identifier of the frame. The processor retrieves the port map from the label table for the received frame and provides the port map to the arbitrator. The equivalent port table indicates groups of ports that are equivalent in the sense that a frame can be sent through any port of an equivalent group to reach the identified destination. If one port in an equivalent port group is currently in use, then a switch protocol controller can instead route the frame to any available port in the equivalent port group. The arbitrators of the switch protocol controllers coordinate access to the switch control unit so that a switch protocol controller can request the switch control unit to configure the crosspoint switch in accordance with the port map. As described below in detail, the arbitrators and the switch control unit are connected to an arbitration bus. The arbitrator is also connected to the output (i.e., transmit side) of the port for transmitting control frames. The transmit controller transmits frames stored in the frame buffer to the crosspoint switch when the switch control unit indicates that the crosspoint switch has been configured appropriately.

[0044] FIG. 3 is a block diagram illustrating the contents of a label table in one embodiment. The entries of the label table are port maps that are indexed by a virtual address. In one embodiment, the destination identifier includes a domain address and a virtual address, which are described below in detail. A virtual address is virtual in the sense that it is not a physical address of a node (or interconnect fabric module). Rather, a virtual address is mapped to a series of output ports of one or more interconnect fabric modules, as specified by their label tables, that define a route from the source device to the destination device. A port map has one bit for each of the 32 ports of the interconnect fabric module. A bit value of 1 indicates that frames directed to the indexing virtual address should be routed to the corresponding port. For example, the first entry in the label table contains a bit value of 1 in the column corresponding to port 2 and a bit value of 0 in all the other columns, corresponding to ports 0, 1, and 3-31. When a frame is directed to virtual address 0, the corresponding entry in the label table indicates that the frame should be routed only to port 2. The second entry in the label table indicates that frames directed to virtual address 1 are to be routed to ports 2-31, but not to ports 0 and 1. In one embodiment, the label table of each switch protocol controller contains 8K entries. One skilled in the art will appreciate that the size of the label table can be adjusted to meet overall performance goals of the interconnect fabric module. Because each switch protocol controller has its own label table, a frame received via port 2 with a virtual address of 5 would be routed in accordance with the port map in the sixth entry of the label table for port 2.
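The following Python sketch models the label table lookup described above. It assumes that a port map is held as a 32-bit integer with bit i standing for port i and that unused entries hold 0 (route nowhere). The names `LabelTable`, `port_map_from_ports`, and `ports_from_port_map` are hypothetical and are not taken from the specification. The two routes reproduce the first two FIG. 3 entries, placed here in the table of port 0.

```python
NUM_PORTS = 32
LABEL_TABLE_ENTRIES = 8 * 1024  # 8K entries in one embodiment


def port_map_from_ports(ports):
    """Build a port map with bit i set for every port i in `ports`."""
    port_map = 0
    for port in ports:
        if not 0 <= port < NUM_PORTS:
            raise ValueError(f"port {port} out of range 0-{NUM_PORTS - 1}")
        port_map |= 1 << port
    return port_map


def ports_from_port_map(port_map):
    """List the ports whose bits are set in a port map, lowest first."""
    return [port for port in range(NUM_PORTS) if port_map & (1 << port)]


class LabelTable:
    """Label table of one switch protocol controller: virtual address -> port map."""

    def __init__(self, size=LABEL_TABLE_ENTRIES):
        self.entries = [0] * size  # a port map of 0 routes the frame nowhere

    def set_route(self, virtual_address, ports):
        self.entries[virtual_address] = port_map_from_ports(ports)

    def lookup(self, virtual_address):
        return self.entries[virtual_address]


# Each switch source port has its own label table.
label_tables = [LabelTable() for _ in range(NUM_PORTS)]

# The first two FIG. 3 entries, placed (for example) in the table of port 0.
fig3_table = label_tables[0]
fig3_table.set_route(0, [2])                  # virtual address 0 -> port 2 only
fig3_table.set_route(1, range(2, NUM_PORTS))  # virtual address 1 -> ports 2-31
```

For a frame received via port 0 with virtual address 1, `ports_from_port_map(label_tables[0].lookup(1))` yields ports 2 through 31. An integer bitmask is used because it mirrors the one-bit-per-port layout of FIG. 3.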

[0045] FIG. 4 is a block diagram illustrating the format of a frame in one embodiment. The illustrated frame is in Fibre Channel format. One skilled in the art will appreciate that other formats, such as the InfiniBand format, can be used. A frame contains a start-of-frame portion, a header portion, a data portion, and an end-of-frame portion. The header portion includes a 24-bit destination identifier field, a 24-bit source address field, an 8-bit control field, an 8-bit type field, and an 8-bit priority field. The data portion is variable length and contains up to 2112 bytes. The destination and source identifiers include a domain address and a virtual address. The destination identifier identifies a path from a source device (e.g., node or switch) to one or more devices to which a frame is to be sent. The source identifier identifies a path from the destination device to the source device. The control field indicates whether the frame is a control frame or a data frame. A control frame may include response frames (e.g., an acknowledge frame), fabric control frames, flow control management frames, and link control frames. The flow control management and link control frames are standard Fibre Channel defined frames. The type field indicates the type of data in the data field. A data frame contains payload data that is to be sent from one node to another node using the interconnect fabric. The class of a frame specifies whether a frame is to be sent with or without a connection (e.g., Fibre Channel class 1 provides a connection with acknowledgment). The class field may indicate a class, a priority value, and a preemption flag. Start-of-connection and end-of-connection frames delimit a connection. A connection is a bi-directional, physical connection from a source node through the interconnect fabric to a destination node. When the interconnect fabric receives a start-of-connection frame, the interconnect fabric modules cooperate to establish a physical connection between the source and destination nodes. The physical connection is maintained until an end-of-connection frame is sent via the connection, or until an interconnect fabric module that needs to use one of its ports dedicated to the existing connection receives a frame that has a higher priority than the connection and that designates preemption of conflicting connections (i.e., its preemption flag is set).

[0046] FIG. 5 is a diagram illustrating logic of an arbitrator of a switch protocol controller in one embodiment. The arbitrator communicates with the switch control unit via the arbitration bus. The arbitration bus follows the IEEE 896 Futurebus+ arbitration protocol. The arbitration bus is a wired-OR bus onto which multiple arbitrators can drive their information simultaneously. Based on the information being driven on the arbitration bus, each arbitrator determines whether it is the highest-priority arbitrator currently driving the bus. When an arbitrator determines that it does not have the highest priority, it stops driving its information onto the bus. Ultimately, only the arbitrator with the highest priority remains driving the bus. At that point, the switch control unit retrieves the information from the arbitration bus, which includes the port map for the destination identifier, the switch source port number, and the class. The switch control unit then configures the crosspoint switch to connect the input of the switch source port to the output of each switch destination port identified by the port map. The switch control unit then notifies the arbitrator with the highest priority that the crosspoint switch has been configured. In one embodiment, the arbitration bus includes 32 port status lines to indicate whether the corresponding port is currently in use. The switch control unit sets and clears the status lines as it configures the crosspoint switch. If the port status lines indicate that the crosspoint switch cannot be configured in accordance with the port map (e.g., a port indicated in the port map is in use), then the arbitrator, in general, does not participate in arbitrations until all the switch destination ports indicated by the port map become available. In block 501, the arbitrator raises an arbitration signal on the arbitration bus. If the arbitration signal is already raised, then the arbitrator waits until the arbitration signal is lowered before raising it. Two arbitrators may raise the arbitration signal simultaneously; if so, the arbitrator with the highest-priority frame is given control of the arbitration bus. In block 502, the arbitrator drives a 12-bit competition number, comprising the 7-bit priority of the frame and the 5-bit port number of its port, onto the arbitration bus. In decision block 503, if the arbitrator does not have the highest priority, then it stops driving the competition number and other data onto the arbitration bus in block 504 and returns to block 501 so that it can later raise the arbitration signal and try again. If the arbitrator has the highest priority and all the other arbitrators have stopped driving the arbitration bus, then the arbitrator continues at block 505. In block 505, the arbitrator drives the port map, its 5-bit port number, and the class onto the arbitration bus. In block 506, the arbitrator stops driving data on the arbitration bus and then lowers the arbitration signal. In block 507, the arbitrator receives confirmation from the switch control unit that the crosspoint switch has been appropriately configured. In block 508, the arbitrator signals the transmit controller to transmit the frame to the crosspoint switch and then completes. At that point, other arbitrators detect that the arbitration signal has been lowered and can then arbitrate for access to the switch control unit.
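The arbitration round of blocks 502-504 can be approximated with the following Python model of a wired-OR bus. It is a simplified illustration, not the IEEE 896 Futurebus+ protocol itself. It assumes that the 12-bit competition number places the 7-bit frame priority above the 5-bit port number, so priority dominates and, in this model, equal priorities are resolved in favor of the higher port number. The function names are hypothetical.

```python
PRIORITY_BITS = 7
PORT_BITS = 5
COMPETITION_BITS = PRIORITY_BITS + PORT_BITS  # 12-bit competition number


def competition_number(priority, port):
    """Pack a 7-bit frame priority above a 5-bit port number."""
    if not 0 <= priority < (1 << PRIORITY_BITS):
        raise ValueError("priority must fit in 7 bits")
    if not 0 <= port < (1 << PORT_BITS):
        raise ValueError("port must fit in 5 bits")
    return (priority << PORT_BITS) | port


def arbitrate(contenders):
    """Resolve one arbitration round on a simulated wired-OR bus.

    `contenders` maps port number -> frame priority. Working from the most
    significant bit down, the bus carries the OR of every bit still being
    driven. A contender that drives 0 where the bus reads 1 has lost and
    stops driving. The survivor holds the largest competition number.
    """
    numbers = {port: competition_number(prio, port) for port, prio in contenders.items()}
    remaining = set(numbers)
    for bit in reversed(range(COMPETITION_BITS)):
        mask = 1 << bit
        bus_bit = any(numbers[p] & mask for p in remaining)  # wired-OR of drivers
        if bus_bit:
            remaining = {p for p in remaining if numbers[p] & mask}
    (winner,) = remaining  # port numbers are unique, so exactly one survives
    return winner
```

For example, with contenders `{3: 10, 7: 10, 12: 5}` (port: priority), ports 3 and 7 tie on priority and `arbitrate` returns 7.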

[0047] FIG. 6 is a block diagram illustrating the transmit controller in one embodiment. The transmit controller 600 includes a frame generator 601, a multiplexor 602, and an encoder 603. When directed by the arbitrator, the transmit controller either generates and transmits a control frame or transmits the frame currently stored in the frame buffer. The encoder forwards the frame to the crosspoint switch for transmission through the switch destination ports.

[0048] FIG. 7 is a block diagram illustrating the interconnection of interconnect fabric modules forming an interconnect fabric that connects various nodes. In this example, the interconnect fabric modules 701, 702, 703, and 704 form a fully connected interconnect fabric. An interconnect fabric is fully connected when each interconnect fabric module is directly connected to every other interconnect fabric module. For example, interconnect fabric module 701 is directly connected to interconnect fabric module 702 via link 762, to interconnect fabric module 703 via link 763, and to interconnect fabric module 704 via link 764. Each interconnect fabric module is also directly connected to various nodes. For example, interconnect fabric module 701 is directly connected to nodes 710. The ports of an interconnect fabric module that are directly connected to other interconnect fabric modules are referred to as expansion ports (“E-ports”), and the ports of an interconnect fabric module that are connected to nodes are referred to as fabric ports (“F-ports”). FIG. 7 illustrates that a connection has been established between node 711 and node 746. Node 711 is directly connected to port 0 of interconnect fabric module 701. Port 0 of interconnect fabric module 701 is connected to port 30 via the crosspoint connection 771. Port 30 of interconnect fabric module 701 is directly connected to port 29 of interconnect fabric module 704 via link 764. Port 29 of interconnect fabric module 704 is connected to port 16 via the crosspoint connection 774. Port 16 of interconnect fabric module 704 is directly connected to node 746. While the connection is maintained, all frames sent from node 711 through port 0 of interconnect fabric module 701 are transmitted through the connection to node 746 via port 16 of interconnect fabric module 704. When a frame is transmitted using a connectionless protocol, the crosspoint switches of the interconnect fabric modules are dynamically configured to route the frame from the source node to the destination node. That is, once an interconnect fabric module transmits a frame from its switch source port through its switch destination port, those ports are available to be reconnected to other ports. Thus, with a connectionless protocol, each frame results in an arbitration at each interconnect fabric module in the path from the source node to the destination node.

Destination Identifier

[0049] FIG. 8 is a block diagram illustrating the mapping of a destination identifier to a port map. Each interconnect fabric module has an interconnect fabric module identifier 801. In one embodiment, the interconnect fabric module identifier contains a domain address that has been assigned to the interconnect fabric module. When a frame is processed by a switch protocol controller, the switch protocol controller determines whether the domain address of the destination identifier matches the domain address assigned to the interconnect fabric module. If so, then the switch protocol controller uses the virtual address label table to retrieve the port map. (The label table is sub-divided into a virtual address label table and a domain address label table.) If the domain addresses do not match, then the switch protocol controller uses the domain address label table to retrieve the port map. The domain address of a frame specifies those interconnect fabric modules that are configured to route the frame, and the domain address label table is used to route frames to the interconnect fabric modules that are configured to route them.

[0050] A switch protocol controller may include a destination identifier buffer 802, a comparator 805, a domain address label table 806, a virtual address label table 807, and a selector 808. The comparator inputs are the domain addresses of the interconnect fabric module and of the destination identifier. The comparator signals whether the domain addresses match. The domain address label table is indexed by the domain address of the destination identifier and outputs the indexed port map. The virtual address label table is indexed by the virtual address of the destination identifier and outputs the indexed port map. The port maps of the domain address label table and the virtual address label table are input to the selector, which selects a port map based on the input generated by the comparator. That is, the port map is selected from the virtual address label table when the domain addresses of the interconnect fabric module and of the destination identifier match, and from the domain address label table when they do not match.
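The comparator and selector of FIG. 8 can be sketched in Python as follows. The specification does not state how the 24-bit destination identifier is divided between the domain address and the virtual address, so the 8-bit/16-bit split below is an assumption. The dictionary-based tables and the use of 0 for a missing entry are also assumptions. Like the hardware, the sketch looks up both tables and lets the comparator result choose between them.

```python
DOMAIN_BITS = 8    # assumption: upper 8 bits of the 24-bit destination identifier
VIRTUAL_BITS = 16  # assumption: lower 16 bits of the 24-bit destination identifier


def split_destination_identifier(d_id):
    """Split a 24-bit destination identifier into (domain, virtual) addresses."""
    if not 0 <= d_id < (1 << (DOMAIN_BITS + VIRTUAL_BITS)):
        raise ValueError("destination identifier must fit in 24 bits")
    domain_address = d_id >> VIRTUAL_BITS
    virtual_address = d_id & ((1 << VIRTUAL_BITS) - 1)
    return domain_address, virtual_address


def select_port_map(d_id, ifm_domain, domain_label_table, virtual_label_table):
    """Model of comparator 805, label tables 806/807, and selector 808."""
    domain_address, virtual_address = split_destination_identifier(d_id)
    domains_match = domain_address == ifm_domain                    # comparator 805
    domain_port_map = domain_label_table.get(domain_address, 0)     # table 806
    virtual_port_map = virtual_label_table.get(virtual_address, 0)  # table 807
    return virtual_port_map if domains_match else domain_port_map   # selector 808


# Hypothetical contents for an IFM whose assigned domain address is 0x05.
domain_label_table = {0x07: 1 << 30}     # frames for domain 0x07 leave via port 30
virtual_label_table = {0x0010: 1 << 16}  # local virtual address 0x0010 -> port 16
```

With these tables, a frame addressed to `0x050010` matches the local domain and uses the virtual address table (port 16). A frame addressed to `0x070010` does not match, so it uses the domain address table (port 30).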

Label Table Caching

[0051] In one embodiment, multiple switch protocol controllers of an interconnect fabric module may share a single label table that may include both a virtual address label table and a domain address label table. The contents of the label table may be dynamically modified to reflect routing algorithms used by a manager of the interconnect fabric. Each switch protocol controller that shares a single label table may include a local label table cache in which it stores port maps recently retrieved from the shared label table. A switch protocol controller resolves an address (e.g., a virtual address or domain address) into its corresponding port map by first checking its local label table cache. If the port map corresponding to that address is not in the local label table cache, then the switch protocol controller accesses the shared label table. The use of local label table caches and a shared label table represents a two-tier caching system. In one embodiment, the switch protocol controllers use a three-tier caching system. The third tier provides access to an extended label table that contains port maps not currently contained in the shared label table. Thus, when the shared label table does not contain the port map for an address, a switch protocol controller uses an extended label table interface to retrieve a port map for that address from a device that is external to the interconnect fabric module. FIG. 9 is a block diagram illustrating switch protocol controller caching in one embodiment. In this embodiment, four switch protocol controllers share a label table. The four switch protocol controllers may be contained on a single board or chip referred to as a quad switch protocol controller 900. Switch protocol controllers 910, 920, 930, and 940 share a single label table 950. Each switch protocol controller has a local label table cache, such as local label table cache 911 for switch protocol controller 910. In one embodiment, the extended label table interface 960 provides access to an interconnect fabric module manager that receives requests for port maps not currently stored in the shared label table and provides the requested port maps. Alternatively, the extended label table interface provides access directly to an external label table. The interconnect fabric module manager may access an overall manager of the interconnect fabric to retrieve the port maps. One skilled in the art will appreciate that various well-known caching techniques may be used to implement the two-tier or three-tier caching system of the switch protocol controllers.
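One way to realize the two-tier and three-tier caching of FIG. 9 is sketched below in Python. Least-recently-used eviction in the local cache is only one of the “well-known caching techniques” the text leaves open, and the default capacity of 4 is arbitrary. `fetch_external` is a hypothetical stand-in for the extended label table interface. The sketch does not invalidate cached port maps when the shared label table is modified. A design that dynamically changes label table contents would need to handle that.

```python
from collections import OrderedDict


class SharedLabelTable:
    """Tier 2: label table 950 shared by the controllers of a quad SPC.

    `fetch_external` is a hypothetical callable standing in for the extended
    label table interface 960 (tier 3), e.g. a request to the IFM manager.
    """

    def __init__(self, fetch_external):
        self.entries = {}
        self.fetch_external = fetch_external
        self.external_fetches = 0

    def resolve(self, address):
        if address not in self.entries:
            self.external_fetches += 1
            self.entries[address] = self.fetch_external(address)
        return self.entries[address]


class LocalLabelTableCache:
    """Tier 1: per-controller cache with least-recently-used eviction."""

    def __init__(self, shared, capacity=4):
        self.shared = shared
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def resolve(self, address):
        if address in self.cache:
            self.hits += 1
            self.cache.move_to_end(address)  # mark as most recently used
            return self.cache[address]
        self.misses += 1
        port_map = self.shared.resolve(address)  # fall through to tier 2 (and 3)
        self.cache[address] = port_map
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return port_map
```

Four `LocalLabelTableCache` objects built over one `SharedLabelTable` model quad switch protocol controller 900. A port map fetched once through the external tier is then served from the shared table to every controller.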

Multiframe Buffering

[0052] In one embodiment, a switch protocol controller may implement multiframe buffering of the frames received through its input. Multiframe buffering allows a switch protocol controller to internally store multiple frames that have not yet been transmitted by the switch protocol controller. Multiframe buffering allows the device (e.g., node or interconnect fabric module) that sends a frame to the switch protocol controller to continue sending additional frames as long as a buffer is available at the switch protocol controller. In one embodiment, the devices may use the flow control mechanism of the Fibre Channel standard to coordinate the transmission of frames between devices. A switch protocol controller may implement a buffer arbitration algorithm to identify which of the frames in the multiframe buffer should be transmitted by the switch protocol controller. A buffer arbitrator of the switch protocol controller may use the priority and class of service of the frame to select the next frame to be transmitted. The buffer arbitrator may also factor in the latency of a frame (i.e., the length of time the frame has been stored at the switch protocol controller). One skilled in the art would appreciate that many different types of buffer arbitration algorithms may be used, such as algorithms that attempt to ensure that each frame is transmitted before it times out or that use a first-in-first-out approach. Also, the buffer arbitration algorithm may be loaded at initialization or dynamically after initialization from the interconnect fabric module manager. In one embodiment, when a buffer arbitrator selects a start-of-connection frame, subsequent frames of that connection are automatically selected by the buffer arbitrator. This ensures that frames not associated with a connection are not transmitted via the connection.

[0053] FIG. 10 is a block diagram illustrating multiframe buffering. A switch protocol controller 1000 includes a receive controller 1001, a multiframe buffer 1002, and a buffer arbitrator 1003. The receive controller receives frames via the input of its port and stores each frame in the next available buffer of the multiframe buffer. In one embodiment, the receive controller may store all the frames of a connection in the same buffer. Alternatively, the receive controller may store frames of a connection in different buffers, and the buffer arbitrator ensures that frames of an established connection are given the highest priority. The buffer arbitrator is enabled when the switch protocol controller is ready to process the next frame. FIG. 11 is a diagram illustrating the logic of the buffer arbitrator in one embodiment. In decision block 1101, if the switch protocol controller is currently in a connection, then the buffer arbitrator selects the buffer associated with the connection and completes. If, however, the switch protocol controller is not currently in a connection, then the buffer arbitrator calculates a priority score for each frame stored in the multiframe buffer. The buffer arbitrator uses a buffer arbitration algorithm to calculate the priority score. In block 1104, the buffer arbitrator selects the buffer containing the frame with the highest priority score to be processed next and completes.
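The FIG. 11 flow can be sketched in Python as follows. The weighted-sum scoring in `priority_score`, its weights, and the numeric encoding of the class of service are hypothetical. The text leaves the buffer arbitration algorithm open and notes that it may be loaded from the interconnect fabric module manager. While the controller is in a connection, the sketch returns only a buffer holding that connection's frames, or nothing, so frames not associated with the connection are never selected for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferedFrame:
    buffer_index: int
    priority: int                        # frame priority from the header
    service_class: int                   # class of service (hypothetical numeric encoding)
    arrival_time: float                  # when the frame was stored
    connection_id: Optional[int] = None  # set for frames belonging to a connection


def priority_score(frame, now, latency_weight=1.0, class_weight=4.0):
    """Hypothetical buffer arbitration algorithm: weighted sum of priority,
    class of service, and time spent waiting in the multiframe buffer."""
    latency = now - frame.arrival_time
    return frame.priority + class_weight * frame.service_class + latency_weight * latency


def select_buffer(frames, now, active_connection=None):
    """Pick the buffer to process next, following the FIG. 11 flow."""
    if active_connection is not None:  # decision block 1101
        conn_frames = [f for f in frames if f.connection_id == active_connection]
        if not conn_frames:
            return None  # nothing for the connection yet; other frames must not use it
        return min(conn_frames, key=lambda f: f.arrival_time).buffer_index
    if not frames:
        return None
    best = max(frames, key=lambda f: priority_score(f, now))  # score, then block 1104
    return best.buffer_index
```

With frames scored at time 10, a long-waiting frame of moderate priority can outrank a newer frame of higher priority. Changing the weights shifts that balance without changing the selection logic.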

Interswitch Load Balancing via Groups of Equivalent Ports

[0054] The interconnect fabric modules may be interconnected to provide interswitch load balancing. For example, two interconnect fabric modules may have multiple direct links between them to increase the bandwidth available for frames transmitted between the interconnect fabric modules. FIG. 12 is a block diagram illustrating an interconnect fabric configuration with multiple direct links between interconnect fabric modules. In this example, interconnect fabric module 1201 and interconnect fabric module 1202 have three direct links 1210, 1211, and 1212 between them. The use of multiple direct links allows multiple frames to be transmitted simultaneously between the directly linked interconnect fabric modules. For example, three nodes directly linked to interconnect fabric module 1201 may simultaneously have connections established to three different nodes directly linked to interconnect fabric module 1202. In this example, interconnect fabric module 1201 is also indirectly linked to interconnect fabric module 1202 via link 1213 and links 1214, 1215, and 1216 through interconnect fabric module 1204. Ports of an interconnect fabric module are equivalent when they can be used interchangeably to route frames to their destination. Equivalent ports may have similarly configured label tables.

[0055] In one embodiment, each switch protocol controller has an equivalent port table that defines which ports of the interconnect fabric module are logically equivalent to one another. (Alternatively, the switch protocol controllers of an interconnect fabric module may share an equivalent port table.) For example, ports 0, 1, and 2 may be equivalent ports for both interconnect fabric module 1201 and interconnect fabric module 1202. When the header processor selects a port map, an equivalent port service of the switch protocol controller determines whether the ports of the port map are currently available. If a port is not currently available, the equivalent port service determines from the equivalent port table whether an equivalent port is available. If so, the equivalent port service modifies the port map so that the frame is routed through the equivalent port. For example, if a port map designates port 0 of interconnect fabric module 1201, but port 0 is currently in use, then the equivalent port service may select port 1 as an equivalent to replace port 0 in the port map (assuming port 1 is not currently in use).

[0056] FIG. 13 is a block diagram illustrating the use of equivalent ports. Equivalent port service 1303 inputs a port map that may be generated using virtual address 1301 and virtual address label table 1302. Alternatively, the port map may be generated using a domain address and a domain address label table. The equivalent port service also inputs equivalent port table 1304. The equivalent port table contains an entry for each port of the interconnect fabric module. Each entry, referred to as an equivalent port map, contains a bit for each port of the interconnect fabric module. In this example, the entry for port 0 has its bits for port 1 and port 2 set to indicate that port 0, port 1, and port 2 are equivalent. The entry for port 1 has its bits for port 0 and port 2 set to indicate the same equivalence. The equivalent port service also inputs the port status lines, which indicate the current status of each of the ports of the interconnect fabric module. When the equivalent port service receives a port map, it determines whether the designated ports are available based on the port status. If a designated port is not available, the equivalent port service retrieves the equivalent port map for that designated port. The equivalent port service then determines whether any of the equivalent ports are available. If an equivalent port is available, then the equivalent port service changes the port map to designate an available equivalent port. If no equivalent ports are available, then the equivalent port service leaves the port map unchanged. In one embodiment, an equivalent port map may have a priority associated with each port. The equivalent port service may select equivalent ports based on their associated priority. The priorities may be useful, for example, when ports are equivalent but the cost of routing a frame through them differs. For example, port 3 of interconnect fabric module 1201 may be equivalent to port 0, port 1, and port 2, but the cost of routing a frame through port 3 may be higher because the frame would travel through interconnect fabric module 1204 on its way to interconnect fabric module 1202.

[0057] FIG. 14 is a diagram illustrating the logic of the equivalent port service in one embodiment. The equivalent port service receives an input port map and processes each designated port of the input port map. The service may initialize the output port map so that no ports are designated. In block 1401, the service selects the next designated port of the input port map. In decision block 1402, if all the designated ports have already been selected, then the service completes, else the service continues at block 1403. In decision block 1403, if the selected port is available, then the service designates the selected port in the output port map and proceeds to select the next designated port of the input port map; otherwise, the service continues at block 1405. In block 1405, the service retrieves the equivalent port map for the selected port from the equivalent port table. In block 1406, the service selects the next designated port of the selected equivalent port map. In decision block 1407, if all designated ports of the equivalent port map have already been selected, then the service continues at block 1408, else the service continues at block 1409. In block 1408, because no equivalent ports are available, the service designates the selected port of the input port map in the output port map and completes. The service may repeat this process as ports become available. In decision block 1409, if the selected port of the equivalent port map is available, then the service continues at block 1410, else the service loops to block 1406 to select the next designated port of the equivalent port map. In block 1410, the service designates the selected port of the equivalent port map in the output port map and then loops to select the next designated port of the input port map.
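The equivalent port service of FIGS. 13 and 14 can be sketched in Python as follows. Port maps and equivalent port maps are held as integer bitmasks, and `port_available` stands in for the port status lines. Two choices here are assumptions. First, when no equivalent port is free, the sketch keeps the original designation and moves on to the remaining designated ports, matching the FIG. 13 description of leaving the port map unchanged, instead of completing as block 1408 does. Second, it never substitutes a port that is already designated, so two busy ports are not collapsed onto one. Equivalent ports are tried lowest-numbered first; the per-port priorities mentioned for FIG. 13 are not modeled.

```python
NUM_PORTS = 32


def equivalent_port_service(input_port_map, equivalent_port_table, port_available):
    """Replace busy designated ports with available equivalent ports.

    input_port_map:        bitmask of ports designated by the label table.
    equivalent_port_table: list indexed by port; each entry is an equivalent
                           port map (bitmask of ports equivalent to that port).
    port_available:        callable port -> bool standing in for the port
                           status lines of the arbitration bus.
    """
    output_port_map = 0  # start with no ports designated
    for port in range(NUM_PORTS):                   # blocks 1401-1402
        if not input_port_map & (1 << port):
            continue
        if port_available(port):                    # block 1403
            output_port_map |= 1 << port
            continue
        equivalents = equivalent_port_table[port]   # block 1405
        taken = input_port_map | output_port_map    # ports already designated
        for alt in range(NUM_PORTS):                # blocks 1406, 1407, 1409
            if equivalents & (1 << alt) and port_available(alt) and not taken & (1 << alt):
                output_port_map |= 1 << alt         # block 1410
                break
        else:
            output_port_map |= 1 << port            # block 1408: no equivalent free
    return output_port_map


# FIG. 13 example: ports 0, 1, and 2 are mutually equivalent.
equivalent_port_table = [0] * NUM_PORTS
equivalent_port_table[0] = 0b110
equivalent_port_table[1] = 0b101
equivalent_port_table[2] = 0b011
```

With port 0 busy, a port map designating only port 0 becomes one designating port 1. A port map designating ports 0 and 1 becomes one designating ports 1 and 2.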

Administrative Ports

[0058] In one embodiment, the crosspoint switch of an interconnect fabric module may have more outputs than the interconnect fabric module has ports. For example, a crosspoint switch may have 34 inputs and outputs, but the interconnect fabric module may have only 32 ports. The switch protocol controller may use these additional ports of the crosspoint switch to route upper layer protocol frames, such as frames directed to a name server or other administrative services. In one embodiment, the additional output ports of the crosspoint switch may be connected to the interconnect fabric module manager. An interconnect fabric module may have a list of “reserved” addresses that designate an upper layer protocol port. When a switch protocol controller determines that an address of its frame matches one of the reserved addresses, it enables the routing of that frame to an upper layer protocol port. The routing to upper layer protocol ports may use the same arbitration mechanism as is used for routing to non-upper layer protocol ports. One skilled in the art will appreciate that the arbitration bus would need additional lines to support the additional ports. For example, six lines would be needed to designate ports 0 through 33, rather than the five lines needed to designate ports 0 through 31. Alternatively, when the crosspoint switch does not have an extra output for an upper layer protocol port, an output can be selectively switched between a communications port and an upper layer protocol port depending on whether the address of the destination identifier is reserved.

[0059] FIG. 15 is a block diagram illustrating a component for identifying upper layer protocol ports. This component may be part of the header processor of the switch protocol controller. An administrative port comparator 1503 inputs the virtual address of the destination identifier 1501 of a frame and a reserved address table 1502. The reserved address table has an entry for each reserved address. Each entry contains the value of the reserved address and may contain a flag indicating whether to route a frame designating the reserved address to upper layer protocol port 32 or upper layer protocol port 33. When signaled, the comparator determines whether any of the reserved addresses match the virtual address of the frame. If so, the comparator enables the port 32 or port 33 flag. When a flag is enabled, the arbitrator automatically designates the signaled upper layer protocol port during arbitration. The crosspoint switch 1506 may be connected to the interconnect fabric module manager 1507 via upper layer protocol port 32 and upper layer protocol port 33. In this way, the upper layer protocol frames are routed to the interconnect fabric module manager for further processing.
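The reserved address check of FIG. 15 and the bus-width arithmetic of paragraph [0058] can be sketched in Python as follows. The reserved virtual address values and the `use_port_33` field name are hypothetical. The text only says that an entry may carry a flag selecting upper layer protocol port 32 or 33.

```python
from typing import NamedTuple, Optional

ULP_PORT_32 = 32
ULP_PORT_33 = 33


class ReservedAddress(NamedTuple):
    virtual_address: int
    use_port_33: bool  # flag: route to upper layer protocol port 33 instead of 32


def match_reserved(virtual_address, reserved_table) -> Optional[int]:
    """Model of comparator 1503: return the ULP port for a reserved address, else None."""
    for entry in reserved_table:
        if entry.virtual_address == virtual_address:
            return ULP_PORT_33 if entry.use_port_33 else ULP_PORT_32
    return None


def port_select_lines(max_port):
    """Number of arbitration bus lines needed to encode ports 0..max_port."""
    return max_port.bit_length()


# Hypothetical reserved address table 1502.
reserved_table = [ReservedAddress(0x1FF0, False), ReservedAddress(0x1FF1, True)]
```

Here `port_select_lines(31)` is 5 and `port_select_lines(33)` is 6, matching the five and six arbitration bus lines given in paragraph [0058].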

Interswitch Deadlock Avoidance

[0060] In one embodiment, the switch protocol controllers implement a deadlock avoidance scheme to prevent interswitch deadlocks. An interswitch deadlock may occur when two partially built connections both need the same port to complete their connections. FIG. 16 is a block diagram illustrating an interswitch deadlock. In this example, node 1605 requests that a connection be established to node 1607, and node 1606 requests that a connection be established to node 1605. Node 1605 is directly linked to port 1 of interconnect fabric module 1601, and nodes 1606 and 1607 are directly linked to ports 1 and 2, respectively, of interconnect fabric module 1602. Port 0 of interconnect fabric module 1601 is directly linked to port 0 of interconnect fabric module 1602. Table 1610 illustrates a sequence of events that results in a deadlock. At time 0, nodes 1605 and 1606 send out start-of-connection frames. At time 1, interconnect fabric module 1601 establishes a crosspoint connection between its port 1 and its port 0 via crosspoint switch 1603 as part of the process of establishing the connection between node 1605 and node 1607. At the same time, interconnect fabric module 1602 establishes a crosspoint connection between its port 1 and port 0 via crosspoint switch 1604 as part of the process of establishing the connection between node 1606 and node 1605. At time 2, interconnect fabric module 1601 transmits the start-of-connection frame to interconnect fabric module 1602 via link 1608, and interconnect fabric module 1602 transmits the start-of-connection frame to interconnect fabric module 1601 via link 1608. When interconnect fabric module 1601 receives the start-of-connection frame, it determines that it cannot currently establish a crosspoint connection from port 0 to port 1 because port 0 is in use by the partial connection from node 1605 to node 1607. Similarly, when interconnect fabric module 1602 receives the start-of-connection frame, it determines that it cannot currently establish a crosspoint connection from port 0 to port 2 because port 0 is in use by the partial connection from node 1606 to node 1605. Because neither connection can be completed, a deadlock occurs. One skilled in the art will appreciate that deadlocks can result from a wide variety of sequences of events.

[0061] In one embodiment, a switch protocol controller uses an interswitch deadlock avoidance scheme. Whenever a switch protocol controller receives a start-of-connection frame while it is currently in a connection, a conflict has occurred. A switch protocol controller receives such a conflicting start-of-connection frame when that frame was initially transmitted from a node before the connection that includes the switch protocol controller's port was complete. To avoid a deadlock, once the conflict is detected, the switch protocol controller compares the priority of the conflicting start-of-connection frame with the priority of the start-of-connection frame for its partially built connection to determine which connection should be established. If the frames have the same priority, then the switch protocol controller uses the domain addresses or other unique identifiers of the two interconnect fabric modules involved as a tiebreaker. These are the interconnect fabric module that received the conflicting start-of-connection frame and the one that sent it. If the priority of the conflicting frame is higher, then the switch protocol controller sends a frame back through its input indicating that the connection cannot be established and then proceeds to process the conflicting start-of-connection frame to complete the connection. Conversely, the switch protocol controller that sent the conflicting frame also detects the conflict but determines that the frame it sent has a higher priority and ignores the start-of-connection frame that it just received.

[0062] FIG. 17 is a diagram illustrating the logic of the deadlock avoidance algorithm in one embodiment. The deadlock avoidance algorithm may process each frame that is received by a switch protocol controller. In decision block 1701, if a start-of-connection frame is received, then processing continues at block 1702, else there is no conflict. In decision block 1702, if the port is currently in a connection, then there is a conflict and processing continues at block 1703, else there is no conflict. In decision block 1703, if the frame that established the connection for this port has a higher priority than the conflicting frame just received, then this port wins the conflict and discards the conflicting frame, else processing continues at block 1704. In decision block 1704, if the priorities are equal, processing continues at block 1705 to check the tiebreaker, else this switch protocol controller loses the conflict and continues at block 1706. In decision block 1705, if the domain address of this interconnect fabric module is greater than the domain address of the interconnect fabric module that sent the conflicting frame, then this port wins the conflict, else this port loses the conflict and continues at block 1706. In block 1706, because this port has lost the conflict, it removes the partial connection that has been established through it by sending a remove-connection frame through its input to notify the originating node and the other interconnect fabric modules through which the connection was partially built.
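The decision blocks of FIG. 17 can be sketched as a single function. This is a minimal illustration, not the embodiment itself: the `StartOfConnection` record, the convention that a larger integer means a higher priority, and the `"win"`/`"lose"`/`"no_conflict"` return values are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartOfConnection:
    priority: int        # assumption: larger value means higher priority
    sender_domain: int   # domain address of the IFM that sent the frame


def check_deadlock(frame: Optional[StartOfConnection], in_connection: bool,
                   own_priority: int, own_domain: int) -> str:
    """Return 'no_conflict', 'win' or 'lose' following the FIG. 17 blocks.

    frame is None when the received frame is not a start-of-connection frame;
    own_priority is the priority of the frame that built this port's connection.
    """
    # Block 1701: only start-of-connection frames can conflict.
    if frame is None:
        return "no_conflict"
    # Block 1702: no conflict unless this port is already in a connection.
    if not in_connection:
        return "no_conflict"
    # Block 1703: our frame outranks the newcomer, so discard the newcomer.
    if own_priority > frame.priority:
        return "win"
    # Blocks 1704/1705: equal priority is broken by comparing domain addresses.
    if own_priority == frame.priority and own_domain > frame.sender_domain:
        return "win"
    # Block 1706: lose; the caller sends a remove-connection frame via its input.
    return "lose"
```

Because the tiebreaker compares the two modules' domain addresses, two modules that receive each other's equal-priority frames reach opposite outcomes, so exactly one partial connection is torn down.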

Connection Preemption

[0063] In one embodiment, the interconnect fabric modules allow an existing connection to be preempted when a higher-priority connection that conflicts with it is to be established. FIG. 18 illustrates the preempting of a connection. A connection is established between node 1803 and node 1804. The connection includes link 1806, a crosspoint connection between port 0 and port 1 of interconnect fabric module 1801, link 1807, a crosspoint connection between port 4 and port 6 of interconnect fabric module 1802, and link 1808. Once the connection has been established, node 1805 may send a start-of-connection frame with a higher priority than the existing connection and with its preemption bit (flag) set. When the switch protocol controller for port 2 receives the frame, it selects the switch destination port through which the connection is to be built. The switch protocol controller may use the equivalent port service to identify an available equivalent port if the port designated by the port map is in use. If port 1 is designated in the port map and no equivalent port is available, the switch protocol controller for port 2 detects a conflict. The switch protocol controller then sets a flag indicating that the conflicting port (i.e., port 0 or 1) should participate in the ensuing arbitration. The switch protocol controller for port 2 then sets the arbitration flag, and the conflicting port and port 2 participate in the arbitration. If the conflicting port loses, its switch protocol controller sends a frame through its connection in both directions indicating that the connection is to be removed, and the switch protocol controller for port 2 establishes a crosspoint connection between port 1 and port 2 and transmits its start-of-connection frame. Conversely, if the conflicting port wins the arbitration, then the connection is left established and the switch protocol controller for port 2 sends a frame in its input direction indicating that the connection cannot be established.

[0064] FIG. 19 is a diagram illustrating the logic of processing a preemption signal in one embodiment. This processing is performed when a switch protocol controller detects a preemption signal on the arbitration bus. In decision block 1901, if this port is currently in a connection, then the switch protocol controller may need to participate in the arbitration. In decision block 1902, if this port is the conflicting port (i.e., the port that established the connection), then this port participates in the arbitration and processing continues at block 1903. In block 1903, the switch protocol controller for this port participates in the arbitration. In decision block 1904, if the switch protocol controller of this port loses the arbitration, then it continues at block 1905 to disconnect the connection, else it leaves the connection established. In block 1905, the switch protocol controller sends a disconnect frame in the direction of both its input and its output. In block 1906, the switch protocol controller indicates to the switch control unit to remove the crosspoint connection for this port.
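The FIG. 19 processing can be sketched as follows, under stated assumptions. The description does not say how the arbitration bus picks a winner, so the `arbitrate` function here is hypothetical: it simply compares priorities and lets the existing connection keep the port on a tie. The `Port` record and the strings recorded in `sent_frames` are likewise illustrative stand-ins for the switch protocol controller's state and outgoing frames.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    number: int
    in_connection: bool = False
    is_conflicting_port: bool = False  # flag set by the preempting controller
    connection_priority: int = 0
    sent_frames: List[str] = field(default_factory=list)
    crosspoint_removed: bool = False


def arbitrate(incumbent_priority: int, preempt_priority: int) -> bool:
    """Hypothetical arbitration: True if the existing connection keeps the port.

    Ties favour the incumbent; this is an assumption, not stated in the source.
    """
    return incumbent_priority >= preempt_priority


def on_preemption_signal(port: Port, preempt_priority: int) -> None:
    # Block 1901: ports that are not in a connection ignore the signal.
    if not port.in_connection:
        return
    # Block 1902: only the port flagged as the conflicting port takes part.
    if not port.is_conflicting_port:
        return
    # Blocks 1903/1904: participate; on a win, leave the connection established.
    if arbitrate(port.connection_priority, preempt_priority):
        return
    # Block 1905: send disconnect frames toward both input and output.
    port.sent_frames += ["disconnect->input", "disconnect->output"]
    # Block 1906: have the switch control unit drop the crosspoint connection.
    port.crosspoint_removed = True
    port.in_connection = False
```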

Distributed Class 3 Multicasting

[0065] The Fibre Channel standard defines a class 3 protocol that provides a connectionless service for transmitting frames without acknowledgment. Because the protocol is connectionless and no acknowledgment is used, the class 3 protocol can be used for multicasting, that is, sending a frame from one node to multiple nodes. The class 3 protocol also specifies that frame delivery is not guaranteed. Traditionally, when a Fibre Channel switch receives a class 3 frame for multicasting, it routes that frame through as many of the destination ports as are currently available and then discards it. In one embodiment, a switch protocol controller buffers a class 3 multicast frame and sends the frame through the multicast ports as they become available. Although the timeout of the class 3 frame at the switch protocol controller may expire before all multicast ports become available, buffering multicast frames increases the chance that the frame will be sent through additional multicast ports as they become available. One skilled in the art will appreciate that multiframe buffering can be used with communications services other than Fibre Channel class 3. In particular, it can be used with any non-acknowledged datagram service, also referred to as a packet service. One skilled in the art will also appreciate that multiframe buffering can be used to interleave the transmission of a multicast frame with other frames (e.g., connectionless frames). The multiframe buffering algorithm may, for example, give the multicast frame the highest priority score only when at least one of the multicast ports is currently available.

[0066] FIG. 20 is a diagram illustrating the logic of distributed class 3 multicasting in one embodiment. This logic is performed when a class 3 frame with multicasting is received at a switch protocol controller. In block 2001, the switch protocol controller identifies the multicast ports that are currently available. The multicast ports may be the set of ports indicated by the port map to which a virtual address maps. In block 2002, if any of the multicast ports are available, then the switch protocol controller participates in arbitration. In decision block 2003, if the switch protocol controller wins the arbitration, then it continues at block 2004, else it continues at block 2001 to again participate in an arbitration. In block 2004, the switch protocol controller sends the frame and updates the port map stored in a temporary buffer to reflect those ports through which the frame has been sent. In decision block 2005, if the multicast is complete (i.e., the frame has been transmitted through each multicast port), then processing completes, else processing continues to participate in an arbitration to send the frame as additional ports become available.
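The FIG. 20 loop can be simulated in a few lines. This is a sketch under assumptions: port availability is supplied as a hypothetical sequence of "idle port" sets, one per arbitration round, and running out of rounds stands in for the class 3 frame's timeout. The `wins_arbitration` callback is an illustrative stand-in for the arbitration outcome, which the description does not detail.

```python
from typing import Callable, Iterable, Set


def distribute_multicast(multicast_ports: Set[int],
                         availability: Iterable[Set[int]],
                         wins_arbitration: Callable[[int], bool] = lambda _rnd: True
                         ) -> Set[int]:
    """Send one buffered class 3 frame through its multicast ports as they free up.

    availability yields, per arbitration round, the set of currently idle ports;
    exhausting it models the frame timing out. Returns the ports actually used.
    """
    remaining = set(multicast_ports)  # buffered copy of the port map
    sent: Set[int] = set()
    for rnd, idle in enumerate(availability):
        # Block 2001: which outstanding multicast ports are free right now?
        ready = remaining & idle
        # Block 2002: only arbitrate when at least one of them is free.
        if not ready:
            continue
        # Block 2003: on losing, go back and try again next round.
        if not wins_arbitration(rnd):
            continue
        # Block 2004: send through the ready ports; update the buffered port map.
        sent |= ready
        remaining -= ready
        # Block 2005: finished once every multicast port has been served.
        if not remaining:
            break
    return sent
```

For example, `distribute_multicast({1, 2, 3}, [{1}, set(), {2, 5}, {3}])` reaches all three ports over four rounds, whereas a traditional switch that only uses the ports idle on arrival would have reached port 1 alone.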

[0067] One skilled in the art will appreciate that, although various embodiments of the technology have been described, various modifications may be made without deviating from the spirit and scope of the invention. For example, aspects of the technology may be used on many different types of routing devices (e.g., switches) other than an interconnect fabric module as described herein. Accordingly, the invention is not limited except as by the following claims.

1. A method in a routing device for retrieving an identification of a destination port for data, the data being received through a source port and having an address, the method comprising: when a cache associated with the source port has an identification of a port associated with the address of the data, retrieving the identification of the port from the cache; and when the cache associated with the source port does not have the identification of a port associated with the address of the data and when a table shared by multiple ports including the source port has the identification of a port associated with the address of the data, retrieving the identification of the port from the table.
2. The method of claim 1 including storing the identification of the port retrieved from the table in the cache associated with the source port.
3. The method of claim 1 wherein the cache and the table contain port maps that designate one or more ports.
4. The method of claim 1 wherein the address of the data is a virtual address.
5. The method of claim 1 including: when a table shared by multiple ports including the source port does not have the identification of a port associated with the address of the data, retrieving the identification of the port from a source external to the routing device.
6. The method of claim 5 including storing the identification of the port retrieved from the source external to the routing device in the table.
7. The method of claim 1 wherein the table is shared by four ports.
8. The method of claim 1 wherein the table is shared by multiple ports.
9. The method of claim 1 wherein each port is associated with its own cache.
10. The method of claim 1 wherein the address is a portion of a Fibre Channel frame.
11. The method of claim 1 wherein the address is a portion of an InfiniBand frame.
12. The method of claim 1 wherein the table is a virtual address label table.
13. The method of claim 1 wherein the routing device is an interconnect fabric module.
14. The method of claim 1 wherein the routing device is Fibre Channel compatible.
15. The method of claim 1 wherein the routing device is InfiniBand compatible.
16. The method of claim 1 wherein the address is a domain address.
17. A routing device comprising: a shared collection of mappings of identifiers to destination ports of the routing device; a plurality of source ports, each source port having a cache for storing mappings of identifiers to destination ports of the routing device; a component that retrieves an identification of a destination port from the cache when the cache has a mapping of an identifier associated with a communication received at the source port to a destination port; and a component that retrieves an identification of a destination port from the shared collection when the cache does not have a mapping of the identifier associated with the communication received at the source port to a destination port.
18. The routing device of claim 17 wherein the component that retrieves the identification of a destination port from the collection stores the identification of the destination port retrieved from the collection in the cache.
19. The routing device of claim 17 wherein the cache and the collection contain port maps that designate one or more ports.
20. The routing device of claim 17 wherein the identifier of the communication is a virtual identifier.
21. The routing device of claim 17 including a component that retrieves the identification of the port from a source external to the routing device when the collection does not have a mapping from the identifier of the communication to a destination port.
22. The routing device of claim 21 wherein the component that retrieves the identification of the port from a source external to the routing device stores the identification of the destination port retrieved from the source external to the routing device in the collection.
23. The routing device of claim 17 wherein the collection is shared by multiple source ports.
24. The routing device of claim 17 wherein the identifier is a portion of a Fibre Channel frame.
25. The routing device of claim 17 wherein the identifier is a portion of an InfiniBand frame.
26. The routing device of claim 17 wherein the collection is a virtual identifier label table.
27. The routing device of claim 17 wherein the routing device is a switch.
28. The routing device of claim 17 wherein the routing device is an interconnect fabric module.
29. The routing device of claim 17 wherein the routing device is Fibre Channel compatible.
30. The routing device of claim 17 wherein the routing device is InfiniBand compatible.
31. The routing device of claim 17 wherein the address is a domain address.
32. The routing device of claim 17 wherein the address is part of a virtual identifier.
33. A method in a routing device for retrieving an identification of a destination port for a communication, the communication being received through a source port and having an identifier, the method comprising: when a cache has an identification of a port associated with the identifier of the communication, retrieving the identification of the port from the cache; and when the cache does not have the identification of a port associated with the identifier of the communication and when a mapping shared by multiple ports including the source port has the identification of a port associated with the identifier of the communication, retrieving the identification of the port from the mapping.
34. The method of claim 33 including storing the identification of the port retrieved from the mapping in the cache.
35. The method of claim 33 wherein the cache and the mapping contain port maps that designate one or more ports.
36. The method of claim 33 wherein the identifier of the communication is a virtual address.
37. The method of claim 33 including: when the mapping shared by multiple ports including the source port does not have the identification of a port associated with the identifier of the communication, retrieving the identification of the port from a source external to the routing device.
38. The method of claim 37 including storing the identification of the port retrieved from the source external to the routing device in the mapping.
39. The method of claim 33 wherein each port is associated with its own cache.
40. The method of claim 33 wherein the identifier is a portion of a Fibre Channel frame.
41. The method of claim 33 wherein the identifier is a portion of an InfiniBand frame.
42. The method of claim 33 wherein the mapping is a label table.
43. The method of claim 33 wherein the routing device is an interconnect fabric module.
44. The method of claim 33 wherein the identifier is a domain address.
45. A routing device comprising: means for mapping identifiers to destination ports in a shared collection; means for mapping identifiers to destination ports in a cache collection for each of a plurality of ports; means for retrieving an identification of a destination port from the cache collection when the cache collection has a mapping of an identifier associated with a communication to a destination port; and means for retrieving an identification of a destination port from the shared collection when the cache collection does not have a mapping of the identifier associated with the communication to a destination port.
46. The routing device of claim 45 wherein the means for retrieving the identification of a destination port from the shared collection includes means for storing a mapping of the identifier to the retrieved identification of the destination port in the cache collection for the source port that received the communication.
47. The routing device of claim 45 wherein the cache collection and the shared collection contain port maps that designate one or more ports.
48. The routing device of claim 45 wherein the identifier of the communication is a virtual identifier.
49. The routing device of claim 45 including means for retrieving the identification of the port from a source external to the routing device when the shared collection does not have a mapping from the identifier of the communication to a destination port.
50. The routing device of claim 49 wherein the means for retrieving the identification of the port from a source external to the routing device stores the identification of the destination port retrieved from the source external to the routing device in the shared collection.
51. The routing device of claim 45 wherein the shared collection is shared by multiple source ports.
52. The routing device of claim 45 wherein the identifier is a portion of a Fibre Channel frame.
53. The routing device of claim 45 wherein the identifier is a portion of an InfiniBand frame.
54. The routing device of claim 45 wherein the shared collection is a virtual identifier label table.
55. The routing device of claim 45 wherein the routing device is an interconnect fabric module.
56. The routing device of claim 45 wherein the identifier is a domain address.
57. The routing device of claim 45 wherein the identifier is part of a virtual identifier.
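The two-level lookup recited in claims 1, 2, 5 and 6 (per-source-port cache, then a label table shared by several ports, then a source external to the routing device, filling each level on a miss) can be sketched as below. This is an illustrative sketch only: the class names, the integer addresses, the `fetch_external` callback standing in for the external source, and the least-recently-used eviction policy are all hypothetical, since the claims do not specify a replacement policy or cache size.

```python
from collections import OrderedDict
from typing import Callable, Dict, FrozenSet, Optional

PortMap = FrozenSet[int]  # a port map designates one or more destination ports


class LabelTable:
    """Label table shared by a group of source ports (e.g. four, as in claim 7)."""

    def __init__(self, fetch_external: Callable[[int], Optional[PortMap]]):
        self.entries: Dict[int, PortMap] = {}
        self.fetch_external = fetch_external  # hypothetical external source

    def lookup(self, address: int) -> Optional[PortMap]:
        port_map = self.entries.get(address)
        if port_map is None:
            # Table miss: ask the external source and keep its answer (claims 5-6).
            port_map = self.fetch_external(address)
            if port_map is not None:
                self.entries[address] = port_map
        return port_map


class SourcePortCache:
    """Per-source-port cache in front of the shared table; LRU is an assumption."""

    def __init__(self, table: LabelTable, capacity: int = 4):
        self.table = table
        self.capacity = capacity
        self.lines: "OrderedDict[int, PortMap]" = OrderedDict()

    def destination(self, address: int) -> Optional[PortMap]:
        if address in self.lines:  # cache hit (claim 1, first branch)
            self.lines.move_to_end(address)
            return self.lines[address]
        port_map = self.table.lookup(address)  # miss: consult the shared table
        if port_map is not None:
            self.lines[address] = port_map  # fill the cache (claim 2)
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used entry
        return port_map
```

Giving each source port its own small cache lets most frames be resolved without contending for the shared table, while the shared table keeps a single copy of each mapping for all the ports that use it.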